Abstract: We prove several new lower bounds for constant depth quantum circuits. The
main result is that parity (and hence fanout) requires log depth circuits, when the circuits are
composed of single qubit and arbitrary size Toffoli gates, and when they use only constantly
many ancillae. Under this constraint, this bound is close to optimal. In the case of a
non-constant number $a$ of ancillae, we give a tradeoff between $a$ and the required depth, that
results in a non-trivial lower bound for fanout when $a = n^{1-o(1)}$.
1 Introduction

There has been significant recent progress in understanding the power of constant depth 
quantum circuits. Such circuits are of considerable interest as the first quantum circuits will 
certainly be small circuits with limited gates and constant depth. Much of the progress in this 
area has been in showing that constant depth circuits are more powerful than their classical 
counterparts. However, these and other upper bounds seem to require the presence of a 
(reversible) quantum fanout gate. A fanout gate takes an arbitrary number of bits and fans
out one of them by taking its XOR with each of the others. Here we consider the question of
whether fanout gates are necessary for these upper bounds. We prove several lower bounds
showing that fanout cannot be computed using only generalized (i.e., unbounded size) Toffoli
and single qubit gates when the number of extra work bits (ancillae) that the circuit uses is
limited.

Fanout gates have proved to be unexpectedly powerful. Moore first observed that
fanout gates and parity gates, in the presence of single qubit gates using ancillae, are
equivalent up to depth 3. This was extended by Green et al. [2]: fanout is even equivalent
to any $\mathrm{MOD}_q$ function (for $q \geq 2$), which determines if the number of 1s in the input is
not divisible by $q$. Here the equivalence is again up to constant depth, but using $O(n)$
ancillae. One may interpret this result by defining quantum circuit classes analogous to classical
constant-depth circuit classes. For example, a reasonable analog of the classical unbounded
fanin and fanout class $\mathrm{AC}^0$ is $\mathrm{QAC}^0_{wf}$, the class of constant depth quantum circuit families
composed of single qubit, generalized Toffoli, and fanout gates. (Here the subscript "wf"
denotes "with fanout.") Similarly one may define quantum analogs of $\mathrm{ACC}(q)$ (called $\mathrm{QACC}(q)$)
and ACC (called QACC). Thus the equivalence of fanout with $\mathrm{MOD}_q$ implies that, for any
$q \geq 2$, $\mathrm{QAC}^0_{wf} = \mathrm{QACC}(q) = \mathrm{QACC}$. Contrast this with the fact that $\mathrm{AC}^0 \neq \mathrm{ACC}$, and,
for any distinct primes $q, p$, $\mathrm{ACC}(q) \neq \mathrm{ACC}(p)$. More recently, Høyer and Špalek [3]
have improved these results by proving these same $\mathrm{QAC}^0_{wf}$ circuits can compute threshold
functions. Thus $\mathrm{QAC}^0_{wf} = \mathrm{QTC}^0$, an even sharper contrast with the classical classes. Indeed,
this result implies that we can approximate the quantum fast Fourier transform in constant
depth using fanout. Thus the "quantum part" of Shor's renowned quantum factoring
algorithm can be carried out with a quite simple, constant depth quantum circuit that uses the
fanout operator.

These results suggest the following question: Is fanout really necessary to do the quantum 
Fourier transform in constant depth? While so much can be "reduced" to fanout, it is far from 
clear how much can be reduced to fanin, even in what appears to be its weakest form (i.e., the
generalized Toffoli gate). Although generalized Toffoli gates can involve just as many bits as 
fanout gates, they may be more feasible to implement and it is instructive to investigate their 
power in constant-depth circuits. Note that Cleve and Watrous proved that with only 
one and two qubit gates it is not possible to approximate the quantum Fourier transform in 
less than log depth, but no similar lower bounds against quantum circuits containing gates 
of unbounded size are known. 

Our main result, proved in Section 4, is that one cannot compute parity (and hence
fanout) with $\mathrm{QAC}^0$ circuits (i.e., in constant depth, without fanout) using a constant number
of ancillae. This is the first hard evidence that $\mathrm{QAC}^0$ and $\mathrm{QAC}^0_{wf}$ may be different, and that
fanout may be necessary for all the upper bound results mentioned above (it certainly is
if we can get by with only constantly many ancillae). The issue of the necessity of ancillae
in quantum computations is a murky one. It is generally accepted that a limited number
(polynomially many relative to the number of inputs) are needed. This seems reasonable
as it allows polynomially extra space in which to carry out a computation. However, it is
possible to approximate any unitary operator with a small set of universal gates without
ancillae (although one apparently needs circuits of great depth and size in order to do so).
Furthermore, to our knowledge, no systematic investigation into the absolute necessity of
ancillae has been done. They play a crucial role in the present result, in which we find
the lower bound to be difficult to obtain when more than sublinearly many ancillae are
allowed. To help clarify this problem, we provide a proof (implicitly claimed, but omitted,
in Cleve and Watrous) that quantum circuits with gates of bounded size must be of log
depth to compute parity (and hence fanout) exactly. In particular, we carefully address the
problem of including ancillae, and show that in this case the depth of the circuit must be
$\log n$ to compute parity, no matter how many ancillae are used. This is given in Section 3.
In Section 4 we allow circuits to include Toffoli gates of unbounded size. It is easiest to
see the log-depth lower bound in the case of zero ancillae, so this result is given first, in
Theorem 4.3. We then explain how the proof yields a depth/ancillae trade-off, showing that
with fewer ancillae one needs greater depth to compute fanout.
We end with some open questions.



2 Preliminaries 

In this section we set down most of our notational conventions and the circuit elements we 
use. Some acquaintance with quantum computational complexity as described in jB] or jlj 
is assumed. 

The following notation and terminology will be convenient. Let $\mathcal{H}$ denote the 2-dimensional
Hilbert space spanned by the computational basis states $|0\rangle, |1\rangle$. Let $\mathcal{H}_1, \ldots, \mathcal{H}_n$
be $n$ copies of $\mathcal{H}$. By $\mathcal{B}_{\{1,\ldots,n\}}$ (or simply "$\mathcal{B}_n$" when the set notation is clearly understood)
we denote the $2^n$-dimensional Hilbert space $\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$ spanned by the usual set of
computational basis states of the form $|x_1, \ldots, x_n\rangle$, where each $x_i \in \{0, 1\}$. We also consider
"quotient spaces of $\mathcal{B}_{\{1,\ldots,n\}}$ over $m$ bits," defined as $\mathcal{B}_{\{i_1,\ldots,i_m\}} = \mathcal{H}_{i_1} \otimes \cdots \otimes \mathcal{H}_{i_m}$, where
$\{i_1, \ldots, i_m\} \subseteq \{1, \ldots, n\}$, which obviously have dimension $2^m$. A "state over a set of $m$ bits"
is a state in such a quotient space. A quantum gate $G$ corresponds to a unitary operator
(also denoted $G$) acting on some quotient space $\mathcal{B}_{\{i_1,\ldots,i_m\}}$ of $\mathcal{B}_n$. We will say that $G$ involves
the bits $i_1, \ldots, i_m$. We will freely identify $G$ with any "extension by the identity" that acts
on a bigger quotient space $\mathcal{B}_A$ for any set of bits $A \supseteq \{i_1, \ldots, i_m\}$, that is, $G$ can be
identified with the operator $G \otimes I$, where $I$ is the identity on $\mathcal{B}_{A - \{i_1,\ldots,i_m\}}$. If we fix a state $|\Psi_m\rangle$
over $m$ bits $\{i_1, \ldots, i_m\}$, we are effectively restricting $\mathcal{B}_{\{1,\ldots,n\}}$ to the $2^{n-m}$-dimensional linear
subspace $|\Psi_m\rangle \otimes \mathcal{B}_{\{1,\ldots,n\} - \{i_1,\ldots,i_m\}}$. The space $\mathcal{B}_{\{1,\ldots,n\} - \{i_1,\ldots,i_m\}}$ is referred to as the quotient
space of $\mathcal{B}_{\{1,\ldots,n\}}$ complementary to $|\Psi_m\rangle$.

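A single-qubit gate is a $2 \times 2$ unitary matrix (e.g., acting in $\mathcal{B}_{\{1\}}$). For example, the
Hadamard gate $H$ is the single-qubit gate

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
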
A generalized Toffoli gate, which we refer to in this paper as simply a Toffoli gate $T$,
transforms computational basis states as follows:

$$T|x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, b \oplus \bigwedge_{i=1}^{n} x_i\rangle.$$

A generalized Z-gate, which we refer to as a Z-gate for brevity, has the following effect:

$$Z|x_1, \ldots, x_n\rangle = (-1)^{\bigwedge_{i=1}^{n} x_i}|x_1, \ldots, x_n\rangle.$$

It is not hard to show that $T = HZH$, where the Hadamard gate $H$ in this equation is
applied to the target bit of $T$. Hence we may substitute Z-gates for T-gates in any circuit
that allows Hadamards (which will be true throughout the paper). Z-gates are useful for our
purposes since they do not permute computational basis states, and thus have no preferred
target bit.
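The identity $T = HZH$ can be checked numerically for small gate sizes. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, matrices are plain Python lists, and the target bit is assumed to be stored as the lowest-order bit of the basis-state index.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]


def toffoli(n):
    """Generalized Toffoli: n control bits, target stored as the lowest bit."""
    dim = 2 ** (n + 1)
    mat = [[0.0] * dim for _ in range(dim)]
    for col in range(dim):
        controls, b = col >> 1, col & 1
        flip = 1 if controls == 2 ** n - 1 else 0  # AND of all control bits
        mat[(controls << 1) | (b ^ flip)][col] = 1.0
    return mat


def z_gate(m):
    """Generalized Z-gate on m bits: sign flip on |1...1> only."""
    dim = 2 ** m
    return [[(-1.0 if r == dim - 1 else 1.0) if r == c else 0.0
             for c in range(dim)] for r in range(dim)]


def hadamard_on_target(m):
    """Identity on the upper m-1 bits, Hadamard on the lowest (target) bit."""
    dim = 2 ** m
    s = 1 / math.sqrt(2)
    mat = [[0.0] * dim for _ in range(dim)]
    for r in range(dim):
        for c in range(dim):
            if r >> 1 == c >> 1:
                mat[r][c] = -s if (r & 1) and (c & 1) else s
    return mat


def toffoli_equals_hzh(n, tol=1e-9):
    """Check T = H Z H entrywise for a Toffoli gate with n controls."""
    h = hadamard_on_target(n + 1)
    hzh = matmul(matmul(h, z_gate(n + 1)), h)
    t = toffoli(n)
    return all(abs(hzh[r][c] - t[r][c]) < tol
               for r in range(len(t)) for c in range(len(t)))
```

Calling `toffoli_equals_hzh(n)` for small `n` compares the two matrices entry by entry; the matrices have dimension $2^{n+1}$, so this is only practical for a handful of qubits.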
The fanout gate $F$ and the parity gate $P$ are defined, respectively, by

$$F|x_1, \ldots, x_n, b\rangle = |b \oplus x_1, \ldots, b \oplus x_n, b\rangle,$$

$$P|x_1, \ldots, x_n, b\rangle = |x_1, \ldots, x_n, b \oplus \bigoplus_{i=1}^{n} x_i\rangle.$$

There is no obvious a priori relation between these operators, but as was observed by Moore,
$F$ is conjugate to $P$ via an $(n+1)$-fold tensor product of Hadamards applied to all the bits:

$$F = H^{\otimes(n+1)} P H^{\otimes(n+1)}. \qquad (1)$$
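The conjugacy of fanout and parity under Hadamards on all $n+1$ bits can be verified exhaustively for small $n$. The sketch below is an illustration under stated assumptions: the function names are hypothetical, basis states are tuples of bits ending in $b$, and the entry $(-1)^{u \cdot v}/2^{m/2}$ of $H^{\otimes m}$ is used so that all arithmetic stays in integers after scaling by $2^m$.

```python
from itertools import product


def fanout(bits):
    """F|x_1..x_n, b> = |b^x_1, ..., b^x_n, b> on a tuple of bits."""
    *xs, b = bits
    return tuple(b ^ x for x in xs) + (b,)


def parity(bits):
    """P|x_1..x_n, b> = |x_1, ..., x_n, b ^ (x_1 ^ ... ^ x_n)>."""
    *xs, b = bits
    return tuple(xs) + (b ^ (sum(xs) % 2),)


def dot(u, v):
    """Inner product of two bit tuples modulo 2."""
    return sum(p & q for p, q in zip(u, v)) % 2


def scaled_conjugate_entry(row, col):
    """2^m times the (row, col) entry of H^{(x)m} P H^{(x)m}, with m = len(row)."""
    m = len(row)
    return sum((-1) ** (dot(row, parity(w)) + dot(w, col))
               for w in product((0, 1), repeat=m))


def fanout_conjugate_to_parity(n):
    """Compare H^{(x)(n+1)} P H^{(x)(n+1)} with F on every pair of basis states."""
    m = n + 1
    states = list(product((0, 1), repeat=m))
    for col in states:
        for row in states:
            expected = 2 ** m if row == fanout(col) else 0
            if scaled_conjugate_entry(row, col) != expected:
                return False
    return True
```

The check costs on the order of $8^{n+1}$ operations, so it is meant for values such as `fanout_conjugate_to_parity(3)` rather than large circuits.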

Recall that Hadamard, phase, CNOT (Toffoli gates for $n = 1$), and $\pi/8$ gates are a
universal set of gates in that any unitary operator can be approximated to an arbitrary 
degree of precision with them. Our lower bound techniques work against arbitrary sets 
of single-qubit gates combined with Z-gates, which is also a universal set by the above 
discussion. 

A quantum circuit is constructed out of layers. Each layer $L$ is a tensor product of a
certain fixed set of gates (in our main theorems, these will consist of single-qubit and
Z-gates). A circuit is simply a (matrix) product of layers $L_1 L_2 \cdots L_d$. (Observe that the "last"
layer $L_d$ is actually the one that is applied directly to the inputs, and $L_1$ is the output
layer.) The number of layers $d$ is called the depth of $C$. A circuit $C$ over $n$ qubits is then a
unitary operator in the $2^n$-dimensional Hilbert space $\mathcal{B}_{\{1,\ldots,n\}}$. Clearly, $C$ computes a unitary
operator $U$ exactly if for all computational basis states, $C|x_1, \ldots, x_n\rangle = U|x_1, \ldots, x_n\rangle$. This is
in general too restrictive, however. One must allow for the presence of "work bits," called
ancillae, that make extra space available in which to do a computation. In that case, in
order to exactly compute the operator $U$ we extend the Hilbert space in which $C$ acts to the
$2^{n+m}$-dimensional space spanned by computational basis states $|x_1, \ldots, x_n, a_1, \ldots, a_m\rangle$, where
again $x_i, a_i \in \{0, 1\}$, the $a_i$ serving as ancillae. Then we say that $C$ cleanly computes $U$ if,
for any $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$,

$$\langle y_1, \ldots, y_n, 0, \ldots, 0|C|x_1, \ldots, x_n, 0, \ldots, 0\rangle = \langle y_1, \ldots, y_n, 0, \ldots, 0|(U \otimes I)|x_1, \ldots, x_n, 0, \ldots, 0\rangle,$$

where $I$ is the identity in the subspace that acts on the ancillae, and the number of 0s in each
state above is $m$. That is, $C$ does a clean computation if the ancillae begin and end all as 0s.
We assume all of our circuits perform clean computations. This is a reasonable constraint,
since only then is it easy to compose the circuits.

Lastly, all circuits should be understood to be elements of an infinite family of circuits 
$\{C_n \mid n > 0\}$, where $C_n$ is a quantum circuit for $n$ qubits.

3 Fanout Requires Log Depth with Bounded Size Gates

It is easy to see that, by an obvious divide-and-conquer strategy, we can compute parity in
depth $\log n$ using just CNOT gates and ancillae. In this section we prove this is optimal
for any bounded size multi-qubit gates, and furthermore that no number of ancillae help to
reduce the depth of the circuit.

Figure 1: Decomposition of the layers of the circuit $C$.
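The divide-and-conquer upper bound mentioned above can be illustrated on classical basis states. The following is a hedged sketch, not the authors' construction: `parity_by_cnot_layers` is a hypothetical name, each layer consists of CNOTs on disjoint pairs of positions, and the sketch only computes the parity in place (it does not uncompute the intermediate values, so it is not a clean computation as written).

```python
def parity_by_cnot_layers(bits):
    """Combine bits pairwise with CNOTs; return (parity, number of layers).

    Assumes at least one input bit. Every layer applies CNOTs to disjoint
    (control, target) pairs, so each layer has depth one.
    """
    if not bits:
        raise ValueError("need at least one input bit")
    state = list(bits)
    active = list(range(len(state)))  # positions holding partial parities
    depth = 0
    while len(active) > 1:
        next_active = []
        for i in range(0, len(active) - 1, 2):
            control, target = active[i], active[i + 1]
            state[target] ^= state[control]  # CNOT(control -> target)
            next_active.append(target)
        if len(active) % 2 == 1:
            next_active.append(active[-1])  # unpaired position waits a layer
        active = next_active
        depth += 1
    return state[active[0]], depth
```

The number of active positions roughly halves in every layer, which is where the logarithmic depth comes from; restoring the inputs would add further layers that this sketch leaves out.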

Let $C = L_1 \cdots L_d$ consist entirely of arbitrary two-qubit gates and single-qubit gates.
(The extension to arbitrary, but fixed, size gates is straightforward.) Further suppose that
$M$ is an observable on a single qubit at the output of the circuit, that is, of layer $L_1$. Let $L'_1$
denote the gate whose output $M$ is measuring. $L'_1$ could be a two-qubit or a single-qubit gate.
In either case, $L_1 = L'_1 \otimes R_1$, where $R_1$ is the tensor product of all the other gates in that
layer, if any. More generally, we decompose layer $i$ similarly, writing $L_i = L'_i \otimes R_i$, where
$L'_i$ is a transformation that acts on some subset of the bits, and $R_i$ acts on the rest.

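Lemma 3.1. For each $d$, there are layers $L'_1, \ldots, L'_d$ such that

$$L_d^{\dagger} L_{d-1}^{\dagger} \cdots L_1^{\dagger} M L_1 \cdots L_{d-1} L_d = L'^{\dagger}_d L'^{\dagger}_{d-1} \cdots L'^{\dagger}_1 M L'_1 \cdots L'_{d-1} L'_d,$$

where, for each $i$, $L'_i$ acts on at most $2^i$ bits. Furthermore, for each $i$, $L'_i$ acts on bits with
indices in some set $S_i$ such that $S_d \supseteq S_{d-1} \supseteq \cdots \supseteq S_1$.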

Figure 1 makes the notation a little clearer. Note that the input will, as usual, be on the
left, but it doesn't enter the claim (or the following argument) at all. 

Proof: The proof of Lemma 3.1 is by induction on $d$. First consider $d = 1$, and consider
the operator $L_1^{\dagger} M L_1$. By the observations above, we may write $L_1 = L'_1 \otimes R_1$, where $L'_1$ is
either a single or two-qubit gate. So,

$$L_1^{\dagger} M L_1 = (L'^{\dagger}_1 \otimes R_1^{\dagger}) M (L'_1 \otimes R_1) = L'^{\dagger}_1 M L'_1$$

by virtue of the fact that $M$ and $R_1$ commute (and $R_1$ is unitary). Since $L'_1$ only depends on
$\leq 2$ qubits, this establishes the result for $d = 1$.

Now suppose that we can write

$$L_d^{\dagger} L_{d-1}^{\dagger} \cdots L_1^{\dagger} M L_1 \cdots L_{d-1} L_d = L'^{\dagger}_d L'^{\dagger}_{d-1} \cdots L'^{\dagger}_1 M L'_1 \cdots L'_{d-1} L'_d,$$

where, for each $i$, $L'_i$ acts on at most $2^i$ bits. In particular, note that $L'_d$ acts on at most $2^d$
bits. Suppose that $L'_d$ acts on indices in the set $S_d$ (where $S_d$ has size $\leq 2^d$). Now by the
induction hypothesis,

$$L_{d+1}^{\dagger} L_d^{\dagger} \cdots L_1^{\dagger} M L_1 \cdots L_d L_{d+1} = L_{d+1}^{\dagger} L'^{\dagger}_d \cdots L'^{\dagger}_1 M L'_1 \cdots L'_d L_{d+1},$$

and $S_d \supseteq S_{d-1} \supseteq \cdots \supseteq S_1$.

The gates in $L'_d$ involve at most the bits in $S_d$. Since the circuit only contains at most
two-qubit gates, all the gates in $L_{d+1}$ involving bits in $S_d$ can act on at most $2^{d+1}$ bits. Let
the tensor product of these gates be denoted by $L'_{d+1}$, and let $S_{d+1}$ denote the set of bits on
which $L'_{d+1}$ acts. Clearly $S_{d+1} \supseteq S_d$. Then for some tensor product of single and two-qubit
gates $R_{d+1}$ we may write $L_{d+1} = L'_{d+1} \otimes R_{d+1}$. Since $R_{d+1}$ acts on bits not in $S_{d+1}$, it
commutes with all the $L'_i$ and $M$, which only act on bits inside $S_{d+1}$. Hence $R_{d+1}$ "cancels
out" and we have the desired relation. □
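The sets $S_1 \subseteq S_2 \subseteq \cdots \subseteq S_d$ from Lemma 3.1 can be traced mechanically for a concrete circuit layout. The sketch below is illustrative only: `light_cone_sizes` is a hypothetical helper, a layer is represented as a list of tuples of qubit indices (one tuple per gate, of length 1 or 2), and `layers[0]` is taken to be the output layer $L_1$.

```python
def light_cone_sizes(layers, measured):
    """Return [|S_1|, ..., |S_d|] for the qubit `measured` at the output.

    layers[0] is L_1 (output layer) and layers[-1] is L_d (input layer).
    Each gate is a tuple of the qubit indices it acts on.
    """
    cone = {measured}
    sizes = []
    for layer in layers:
        grown = set(cone)  # S_i always contains S_{i-1}
        for gate in layer:
            if cone.intersection(gate):
                grown.update(gate)  # gate touches the cone: absorb its qubits
        cone = grown
        sizes.append(len(cone))
    return sizes


# Example layout on 8 qubits, measuring qubit 7; the gate (0, 1) in L_1
# never touches the cone in that layer, so it plays the role of R_1.
example_layers = [
    [(6, 7), (0, 1)],
    [(4, 6), (5, 7)],
    [(0, 4), (1, 5), (2, 6), (3, 7)],
]
example_sizes = light_cone_sizes(example_layers, 7)
```

With two-qubit gates each qubit in the cone can pull in at most one new qubit per layer, so the recorded sizes never exceed $2^i$ at layer $i$; the example layout attains that bound.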

Theorem 3.2. Let $C$ be a quantum circuit on $n$ inputs of depth $d$, consisting of single-qubit
and two-qubit gates, with any number of ancillae, that cleanly computes parity exactly. Then
$d \geq \log n$. If $C$ computes fanout in the same way, then $d \geq \log n - 2$.

Proof: Let $C = L_1 \cdots L_d$ as in Lemma 3.1. Suppose $C$ uses $m$ ancillae, and that it cleanly
computes the parity operator $P$ in depth $d < \log n$. It follows that for any $x_1, \ldots, x_n, b$ and
any measurement operator $M$ on the target bit,

$$\langle x_1, \ldots, x_n, b, 0, \ldots, 0|C^{\dagger} M C|x_1, \ldots, x_n, b, 0, \ldots, 0\rangle = \langle x_1, \ldots, x_n, b|P^{\dagger} M P|x_1, \ldots, x_n, b\rangle. \qquad (2)$$

By Lemma 3.1,

$$C^{\dagger} M C = L_d^{\dagger} L_{d-1}^{\dagger} \cdots L_1^{\dagger} M L_1 \cdots L_{d-1} L_d = L'^{\dagger}_d \cdots L'^{\dagger}_1 M L'_1 \cdots L'_{d-1} L'_d,$$

where the operator $L'_1 \cdots L'_d$ acts on at most $2^d$ inputs. Since $2^d < n$, there is an input on
which that operator does not act. Hence the value on the left hand side of eq. (2) remains
unchanged if we flip some $x_j$. However, the outcome of the measurement on the parity
gate on the right hand side depends on every input, which is a contradiction.

The second assertion in the Theorem follows from eq. (1). □

It is clear that a similar proof will work for any family of circuits that uses a fixed set of
multi-qubit gates with arity independent of $n$. Thus we have the following as a
corollary of the proof of Theorem 3.2.

Corollary 3.3. Let $C$ be a quantum circuit on $n$ inputs of depth $d$, consisting of single-qubit
and multi-qubit gates of size $O(1)$, with any number of ancillae, that cleanly computes
parity, or fanout, exactly. Then $d = \Omega(\log n)$.
4 Parity Requires Log Depth with Few Ancillae



In this section we treat circuits that contain Toffoli gates or, equivalently, Z-gates, of arbi- 
trary size (i.e., that can depend on n). The technique of the preceding section does not work 
in this case. This is because the large gates in general do not cancel, since they may not 
commute with the measurement operator M. 

To see how to proceed, it is useful to briefly consider classical circuits with similar con- 
straints. Suppose we have a classical circuit with NOT gates and unbounded fan-in AND 
and OR gates, but that we do not allow any fanout. Once inputs (or outputs of other gates) 
are used in either an AND or an OR gate, they can not be used again. It is obvious that 
if such a circuit has constant depth, it cannot compute such functions as parity. The AND 
and OR gates can be killed off by restricting a small set of inputs, resulting in a constant 
function, while parity depends on all the inputs. 

In the quantum case, it appears again that the only thing to do is to attempt to "kill 
off" the large Toffoli gates. However, the quantum case is much more subtle since we must 
face the fact that intermediate states are a superposition of computational basis states, 
and furthermore that the Z-gates, in combination with the single-qubit gates, may cause 
entanglement. 

As before, write $C = L_1 L_2 \cdots L_d$. Thus the circuit $C$ transforms the state $|\Psi\rangle$ to
$L_1 \cdots L_d|\Psi\rangle$. We assume wlog that each layer $L_i$ is a tensor product of Z-gates and
single-qubit gates. Further assume wlog that a specific bit (say, the $n$th bit) of $C$ serves as the
output or target bit (which eventually is supposed to agree with the output bit of a parity
gate).

Our main technical lemma is easiest to see in the case that $C$ has no ancillae, which we
assume until later in the section:

Lemma 4.1. Let $C$ be a circuit as described above, with no ancillae. Then for each $1 \leq k \leq d$,
there exists a state $|\Psi_k\rangle$ over at most $2^k$ bits such that for any state $|R\rangle$ in the quotient
space of $\mathcal{B}_n$ complementary to $|\Psi_k\rangle$, the state $L_1 L_2 \cdots L_k(|R\rangle \otimes |\Psi_k\rangle)$ has a 0 in the target
position of $C$.

Proof: The proof is by induction on $k$. First let $k = 1$. There are two cases:

1. In layer $L_1$, the target is the output of a single-qubit gate $S$. Then let the state
$|\Psi_1\rangle = S^{\dagger}|0\rangle$ over the $n$th bit. Now we may write $L_1 = L'_1 \otimes S$, where $L'_1$ acts on the
quotient space $\mathcal{R}$ complementary to $|\Psi_1\rangle$. No matter what state $|R\rangle \in \mathcal{R}$ we choose over
the bits $\{1, \ldots, n-1\}$, it follows that $L_1(|R\rangle \otimes |\Psi_1\rangle) = (L'_1|R\rangle) \otimes (S|\Psi_1\rangle) = (L'_1|R\rangle) \otimes |0\rangle$
has a 0 in the $n$th position.

2. In layer $L_1$, the target is the output of a Z-gate. Write $L_1 = L'_1 \otimes G$, where $G$ is this
Z-gate. In this case, we choose $|\Psi_1\rangle = |0\rangle$ over the $n$th bit. Now $G$ acts both on $|\Psi_1\rangle$ as
well as the complementary quotient space $\mathcal{R}$ (via extension by the identity). But since
$G$ involves a bit that is 0 (i.e., the $n$th bit), $G$ is equivalent to the unit matrix in $\mathcal{R}$.
Hence for any state $|R\rangle \in \mathcal{R}$, $L_1(|R\rangle \otimes |\Psi_1\rangle) = (L'_1 \otimes G)(|R\rangle \otimes |\Psi_1\rangle) = (L'_1|R\rangle) \otimes |0\rangle$
again has a 0 in the $n$th position.
Figure 2: The sets $K_k$ and $R_k$. A Z-gate that involves bits in both sets is shown.



Now suppose the assertion is true for $k - 1$ where $k > 1$. We will show that it remains true
for $k$. Suppose the $|\Psi_{k-1}\rangle$ in the assertion is a state over the (at most) $2^{k-1}$ bits in the set
$K_{k-1}$. Let $R_{k-1}$ denote the rest of the bits $\{1, \ldots, n\} - K_{k-1}$. Thus $|\Psi_{k-1}\rangle$ is a state in $\mathcal{B}_{K_{k-1}}$,
and the quotient space complementary to $|\Psi_{k-1}\rangle$ is $\mathcal{B}_{R_{k-1}}$, which for convenience we denote
by $\mathcal{R}_{k-1}$. We specify the state $|\Psi_k\rangle$ as follows: Start with $K_k := K_{k-1}$ and $R_k := R_{k-1}$. If a
Z-gate $G$ in $L_k$ involves bits both in $K_k$ and in $R_k$, we remove a single bit from $R_k$ on which
$G$ acts, add it to $K_k$, declare the gate $G$ killed, and remove $G$ from further consideration.
Continue until all such Z-gates have been killed. Since each bit in $K_{k-1}$ can be involved
with at most one Z-gate in $L_k$, the number of bits added to $K_k$ (and removed from $R_k$) in
this process is at most $2^{k-1}$. Let $L_k^{(K)}$ denote the gates in $L_k$ that involve the bits in $K_k$,
excluding the Z-gates that have been killed. Then finally, we define the state $|\Psi_k\rangle$ as the
tensor product of $L_k^{(K)\dagger}|\Psi_{k-1}\rangle$ with the state in which all the bits in $K_k - K_{k-1}$ are 0.
Note that $|\Psi_k\rangle$ is a state over at most $2 \cdot 2^{k-1} = 2^k$ bits, as seen in Figure 2.

Let $\mathcal{R}_k$ denote the quotient space complementary to $|\Psi_k\rangle$. Clearly, $\mathcal{R}_k = \mathcal{B}_{R_k}$. Now let
$|R\rangle$ be any state in $\mathcal{R}_k$ (equivalently, over the bits in $R_k$), and apply $L_k$ to $|R\rangle \otimes |\Psi_k\rangle$. Let
$L_k^{(R)}$ denote the gates in $L_k$ acting in $\mathcal{R}_k$, again excluding the Z-gates that have been killed.
Note that any Z-gate in layer $L_k$ that involves bits in $K_k$ as well as $R_k$ acts as the identity
on $\mathcal{R}_k \otimes |\Psi_k\rangle$, by the construction of $|\Psi_k\rangle$. Thus we have eliminated these gates from $L_k$
without any loss of generality. Thus,

$$L_k(|R\rangle \otimes |\Psi_k\rangle) = (L_k^{(R)} \otimes L_k^{(K)})(|R\rangle \otimes |\Psi_k\rangle) = (L_k^{(R)}|R\rangle) \otimes (L_k^{(K)}|\Psi_k\rangle).$$

Now $L_k^{(K)}|\Psi_k\rangle$ is the tensor product of $|\Psi_{k-1}\rangle$ with a number of $|0\rangle$ states. So we conclude
that $L_k(|R\rangle \otimes |\Psi_k\rangle)$ is of the form $|R'\rangle \otimes |\Psi_{k-1}\rangle$ for some state $|R'\rangle \in \mathcal{R}_{k-1}$. Then,

$$L_1 L_2 \cdots L_{k-1} L_k(|R\rangle \otimes |\Psi_k\rangle) = L_1 L_2 \cdots L_{k-1}(|R'\rangle \otimes |\Psi_{k-1}\rangle).$$

By the induction hypothesis, the right hand side of the above equation has a 0 target bit,
which proves the lemma. □
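The bookkeeping in this proof (which bits get moved from $R_k$ to $K_k$ in order to kill Z-gates) can be simulated on index sets alone. The following is a rough sketch under explicit assumptions: `committed_bits` is a hypothetical helper, each layer is given as a list of Z-gates represented by sets of bit indices, single-qubit gates are omitted because they never add bits, and the choice of which outside bit to commit (the smallest index) is arbitrary. It tracks only the sets $K_k$, not the quantum states $|\Psi_k\rangle$.

```python
def committed_bits(layers, target):
    """Return [K_1, ..., K_d] for the killing procedure of Lemma 4.1.

    layers[0] is L_1 and layers[-1] is L_d. Each layer is a list of Z-gates,
    each given as a set of bit indices; gates within a layer are disjoint.
    """
    committed = {target}
    history = [set(committed)]  # K_1: only the target bit is fixed
    for layer in layers[1:]:    # layers L_2, ..., L_d
        previous = set(committed)
        for gate in layer:
            gate = set(gate)
            outside = gate - previous
            if gate & previous and outside:
                # Gate straddles K and R: fix one outside bit to 0 to kill it.
                committed.add(min(outside))
        history.append(set(committed))
    return history


# Example on bits 0..7 with target bit 7.
example_layers = [
    [{6, 7}],
    [{0, 1, 2, 7}],
    [{0, 3, 4}, {5, 6, 7}],
]
example_history = committed_bits(example_layers, 7)
```

Because each bit already in $K_{k-1}$ belongs to at most one Z-gate of $L_k$, each layer adds at most $|K_{k-1}|$ new bits, which is the doubling that gives the $2^k$ bound regardless of how large the individual Z-gates are.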
Remark. With a bit more careful analysis, Lemma 4.1 can be improved to the following:

Lemma 4.2. Let $C$ be a circuit as described above. Then for each $1 \leq k \leq d$, there exists
a state $|\Psi_k\rangle$ over at most $2^{k/2}$ bits such that for any state $|R\rangle$ in the quotient space of $\mathcal{B}_n$
complementary to $|\Psi_k\rangle$, the state $L_1 L_2 \cdots L_k(|R\rangle \otimes |\Psi_k\rangle)$ has a 0 in the target position of $C$.

The difference is that now $|\Psi_k\rangle$ is over only $2^{k/2}$ bits instead of $2^k$. Instead of giving a
formal proof, we will just sketch the reasons for Lemma 4.2. When some bit (the $i$th bit, say)
is moved from $R_k$ to $K_k$, it is set to the $|0\rangle$ state. Consider the gate $G$ (if any) in $L_{k+1}$ that
involves this bit. If $G$ is a single-qubit gate, then no Z-gate is killed involving the $i$th bit, so
no additional bit needs to be added to $K_{k+1}$ for the sake of the $i$th bit. If $G$ is a Z-gate, then
the $i$th bit alone is enough to kill $G$, since this bit is already 0. So again, no additional bit
must be added to $K_{k+1}$ to kill $G$. Thus $k$ must increase by 2 for the size of $K_k$ to double.
Note that we handled the base case of Lemma 4.1 this way, obtaining a state over $1 = 2^0$
bits.

Theorem 4.3. Let $C$ be a circuit of depth $d$ consisting of single-qubit gates and Z-gates
that uses no ancillae. If $d < 2 \log n$, then $C$ cannot compute $P$.

Proof: Suppose $C = P$. Then for any input state, the target bit of $C$ is 0 iff the target
bit of $P$ is 0. By Lemma 4.2 there exists a state $|\Psi\rangle$ on at most $2^{d/2} < n$ bits such that, for
any state $|R\rangle$ on the remaining $n - 2^{d/2}$ bits, $C(|R\rangle \otimes |\Psi\rangle)$ has a value 0 for the target. First
let $|R\rangle$ be the state with 0s in all $n - 2^{d/2}$ positions (since $n - 2^{d/2} > 0$, such positions exist).
Then $P(|R\rangle \otimes |\Psi\rangle)$ has a 0 target. This is only possible if the state $|\Psi\rangle$ is in a quotient space
of $\mathcal{B}_n$ spanned by computational basis states in which an even number of the variables are
1. Now change one of the bits of $|R\rangle$ from 0 to 1. The target of $C(|R\rangle \otimes |\Psi\rangle)$ still has the
value 0, but the target of $P(|R\rangle \otimes |\Psi\rangle)$ must change to 1, which contradicts the assumption
that $C = P$. □

Since fanout and parity are equivalent up to depth 3 (with ancillae), we have immediately 
the following. 

Corollary 4.4. Let $C$ be a circuit of depth $d$ consisting of single-qubit gates and Z-gates
that uses no ancillae. Then, if $d < 2 \log n - 2$, $C$ cannot compute the fanout operation.

We now consider the case in which our circuit has a non-zero number of ancillae. Firstly,
it is clear that Lemmas 4.1 and 4.2 work if we set a target and all ancillae to 0 at the same
time. If there are $a$ many ancillae, then we are setting $a + 1$ "outputs." The conclusion of
the analogous Lemma for $a$ ancillae would then be that the state $|\Psi\rangle$ is over $(a + 1)2^{d/2}$ bits
(since the number of "committed" bits doubles with each second layer, as in Lemma 4.2).
These bits may include all the ancillae, and assuming that $C$ does a clean computation, $|\Psi\rangle$
will be 0 on the ancillae (since they must all start out as 0 in order to return to their final
value of 0). Therefore, if $n > (a + 1)2^{d/2}$, the state $|R\rangle$ does not involve any of the ancillae
and is thus free to take on any value. Thus if $n > (a + 1)2^{d/2}$, the output of $C$ is insensitive
to changes in at least one of the inputs, and hence the circuit is defeated as before. Note we
have a depth/ancillae trade-off as a result. We thus have the following corollary of the proof
of Theorem 4.3.

Corollary 4.5. Let $C$ be a circuit of depth $d$ consisting of single-qubit gates and Z-gates.
If $C$ cleanly computes the parity function with $a$ ancillae, then $d \geq 2 \log(n/(a + 1))$.

We conjecture that $d$ must be at least $2 \log n$ no matter what $a$ is.
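The depth/ancillae trade-off of Corollary 4.5 is easy to tabulate. The sketch below is an illustration only: the function names are hypothetical, and the logarithm is assumed to be base 2, which is consistent with the condition $n > (a+1)2^{d/2}$ used in the argument above.

```python
import math


def parity_depth_lower_bound(n, a):
    """Bound d >= 2*log2(n/(a+1)) from Corollary 4.5, clipped at 0.

    n is the number of inputs and a the number of ancillae (a >= 0).
    """
    return max(0.0, 2 * math.log2(n / (a + 1)))


def depth_too_small(n, a, d):
    """True when n > (a+1) * 2^(d/2), i.e. depth d cannot suffice for parity."""
    return n > (a + 1) * 2 ** (d / 2)
```

For example, with $n = 1024$ inputs the bound is 20 layers with no ancillae and 16 layers with three ancillae; once $a + 1 \geq n$ the bound becomes trivial, which is why the result says nothing about linearly many ancillae.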

We offer an alternative interpretation of our result that arose out of conversations with
L. Longpré. Let us say that a quantum circuit $C$ robustly computes a unitary operator $U$
if $C$ computes $U$ cleanly and, in addition, if its output is insensitive to the initial state of
the ancillae. Thus the ancillae of $C$ can start out in any state whatsoever; the circuit $C$ is
guaranteed to return the ancillae to that state in the end, and always gives the same answer.
This of course puts a much stronger constraint on the circuit (since in the usual model we
only insist on a clean computation when the ancillae are initialized to 0), but such circuits
can be useful (e.g., see exercise 8.5 in Kitaev et al. [4]). It is not hard to see that in this
case, if $C$ consists only of single-qubit and Toffoli gates, then it must have depth $\log n$ to
compute parity, regardless of the number of ancillae.



5 Conclusions and Open Problems 

Following the line of earlier work of Green et al., Høyer and Špalek, and Cleve and Watrous,
our main result gives an optimal, $\Omega(\log n)$ lower bound on the depth of QAC-type circuits
computing fanout, in the presence of limited (slightly sublinear) numbers of ancillae. It would
clearly be desirable to extend our result to obtain the same conclusion when polynomially
many (or an unlimited number of) ancillae are allowed, and thus to prove that
$\mathrm{QAC}^0 \neq \mathrm{QAC}^0_{wf}$.

The role of ancillae in quantum computation has not received much detailed attention. 
Prompted by our considerations here, there are several interesting questions that arise. One 
issue is the necessity of ancillae for specific quantum computations or classes of quantum 
computations. Is there a problem that can be done in constant depth with ancillae but which 
requires $\log n$ depth without ancillae? Similarly, are there computational problems for which
$\log n$ depth is possible with ancillae but without ancillae, polynomial depth is needed? In
general, how many ancillae are needed for specific problems? Is there a general tradeoff that 
can be proved between numbers of ancillae and circuit depth? 

While much has recently been learned concerning constant depth circuit classes, a few 
interesting questions still remain. It would be worthwhile to be able to distinguish between 
the power of quantum gates of unbounded arity. We have seen that Toffoli and Z gates 
(which are equivalent up to constant depth) are weaker than parity and fanout (which are 
equivalent not only to each other but also, for all intents and purposes, to other mod gates, 
threshold gates and the quantum Fourier transform). Are there other natural types of gates 
that lie between these two classes, or is every gate either equivalent, up to constant depth, 
to either single qubit and CNOT gates, or to Toffoli gates, or to parity? It would also be of
interest to characterize exactly what can be computed in constant depth using only single 
qubit and CNOT gates, as even very optimistically, this is the kind of circuit that might be 
built in the not too distant future. 